Abstract: Echo state networks (ESNs), a type of reservoir computing (RC) architecture, are efficient and accurate artificial neural systems for time series processing and learning. An ESN consists of a recurrent neural network core, called a reservoir, which has a small number of tunable parameters and generates a high-dimensional representation of an input. It also has a readout layer that is easily trained using regression to produce a desired output from the reservoir states. Certain computational tasks involve real-time calculation of high-order time correlations, which requires a nonlinear transformation either in the reservoir or in the readout layer. Traditional ESNs employ a reservoir with sigmoid or tanh function neurons. In contrast, some types of biological neurons obey response curves that can be described as a product unit rather than a sum and threshold. Inspired by this class of neurons, we introduce an RC architecture with a reservoir of product nodes for time series computation. We find that the product RC shows many properties of standard ESNs, such as short-term memory and nonlinear capacity. On standard benchmarks for chaotic prediction tasks, the product RC maintains the performance of a standard nonlinear ESN while being more amenable to mathematical analysis. Our study provides evidence that such networks are powerful in highly nonlinear tasks owing to the high-order statistics generated by the recurrent product node reservoir.


I. Introduction 

Understanding contextual information processing in the brain is one of the goals of neuroscience [1]. Dominey et al. [2] proposed a simple model to explain the interaction between the prefrontal cortex, corticostriatal projections, and basal ganglia in the context-dependent control of eye movements. In this model, visual input drives stable activity in the prefrontal cortex, which is projected onto the basal ganglia using learned interactions in the striatum. This model has also been used to explain higher-level cognitive tasks such as grammar comprehension in the brain [3].

More abstract versions of this model, Liquid State Machines [4] and Echo State Networks [5], were later introduced in the neural network community and were subsequently unified under the name reservoir computing (RC) [6]. In RC, a fixed high-dimensional recurrent network, called the reservoir, is driven by an input signal. An adaptive readout layer then combines the reservoir states to produce a desired output. Figure 1 provides a conceptual illustration of RC. An ESN implements this idea with a discrete-time recurrent network with linear or tanh activation functions and a linear readout layer trained using regression. Many variations of the ESN exist and have been successfully applied to engineering tasks such as time series prediction and system identification [7].

Because the recurrent connections in an ESN are fixed, its training is much more efficient than that of ordinary recurrent neural networks (RNNs), making it feasible to use their power in practical applications. The ESN's power in time series processing has been attributed to the reservoir's memory [8], [9] and to the high-dimensional projection of the input, which acts like a temporal discriminant kernel [10] in the critical dynamical regime, where input perturbations in the reservoir dynamics neither spread nor die out [11]-[13].

A major research direction in RC is to study how the choice of reservoir and readout layer architecture may improve performance in different tasks [7]. Recent insights into the nature of computation in ESNs [8], [?], [14] show that the readout layer learns the temporal correlations between the reservoir dynamics and the desired output. The traditional tanh activation function in the reservoir creates nonlinear correlations that are challenging to characterize mathematically and may lead to unpredictable results [15].

Here, we propose that additive neurons with a thresholding transfer function can be replaced by multiplicative neurons with no additional nonlinearity. The use of product nodes in neural networks was introduced in [16] in an effort to learn the suitable high-order statistics for a given task. It has been reported that most synaptic interactions are multiplicative [17]. Examples of such multiplicative scaling in the visual cortex include gaze-dependent input modulation in parietal neurons [18] and modulation of neuronal responses by attention in area V4 [19] and area MT [?]. In addition, locust visual collision avoidance mediated by LGMD neurons [20], optomotor control in flies [21], [22], and the barn owl's auditory localization in inferior colliculus (ICx) neurons can only be explained with multiplicative interactions [23].

Another popular architecture that uses product nodes is the ridge polynomial network [24]. In this architecture, the learning algorithm iteratively adds groups of product nodes with integer weights to the network to compute polynomial functions of the inputs. This process continues until a desired error level is reached. The advantage of product nodes with variable exponents over those used in polynomial networks is that, instead of providing fixed integer powers of the inputs, the network can learn the individual exponents that produce the required pattern [23].



Fig. 1: Computation in an ESN. The reservoir is an excitable recurrent network with $N$ readable output states represented by the vector $x(t)$. The input signal $u(t)$ is fed into one or more points $i$ in the reservoir with a corresponding weight $\omega_i$, denoted by the weight column vector $\omega = [\omega_i]$.


The main contribution of this work is to demonstrate the plausibility of product nodes for recurrent neural networks in the context of reservoir computing. In Section II-A we review the basic ESN architecture that we use as a performance baseline to study product RC. In Section II-B we describe the details of product nodes and some practical considerations for their use in a recurrent neural network. In Section II-C we present our proposed architecture for reservoir computing using product nodes, specifically, the replacement of tanh nodes in the ESN reservoir with product nodes and the corresponding adjustment of the initialization strategy. We show how to use the exponential-of-log trick to simulate the dynamics of product recurrent networks efficiently using ordinary matrix products. We also prove the echo state property for the network to show that the network dynamics are insensitive to initial conditions. The experimental study of the information processing properties of product RCs is presented in Section III. We first study the linear memory and the nonlinear computation capacity of product RCs, and then we evaluate how such networks perform in predicting the Mackey-Glass and Lorenz systems. Our results show that the product RC achieves performance similar to that of an ESN with tanh functions.


II. Model

A. Echo State Network

An ESN consists of an input-driven recurrent neural network, which acts as the reservoir, and a readout layer that reads the reservoir states and produces the output. Mathematically, the input-driven reservoir is defined as follows. Let $N$ be the size of the reservoir. We represent the time-dependent inputs as a column vector $u(t)$, the reservoir state as a column vector $x(t)$, and the output as a column vector $y(t)$. The input connectivity is represented by the matrix $\omega$ and the reservoir connectivity is represented by an $N \times N$ weight matrix $\Omega$. For simplicity, we assume that we have one input signal and one output, but the notation can be extended to multiple inputs and outputs. The time evolution of the reservoir is given by:

$$x(t+1) = f\big(\Omega\, x(t) + \omega\, u(t)\big), \tag{1}$$

where $f$ is the transfer function of the reservoir nodes, applied element-wise to its operand. An optional constant $b$ can be added to the operand to serve as a bias for the reservoir nodes. The transfer function $f$ is usually tanh or linear. The output is generated by multiplying a readout weight matrix $\Psi$ of length $N+1$ with the reservoir state vector $x(t)$ extended by an optional constant 1, denoted $\tilde{x}(t)$:

$$y(t) = \Psi\, \tilde{x}(t). \tag{2}$$

The readout weights $\Psi$ need to be trained using a teacher input-output pair. A popular training technique is the pseudo-inverse method [6]. To use this method, one drives the ESN with a teacher input and records the history of the (extended) reservoir states into a matrix $X$, where the columns correspond to the reservoir nodes and each row holds the states of all nodes at one time step. The corresponding teacher output is denoted by the column vector $y$. The readout can then be calculated as follows:

$$\Psi' = (X'X)^{-1} X' y, \tag{3}$$

where $'$ indicates the transpose of a matrix. Figure 2 shows the architecture of an ESN. We will compare ESNs with linear and with tanh transfer functions to our proposed product node architecture.
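The following is a minimal NumPy sketch of Eqs. (1) to (3), not the authors' code. It assumes tanh nodes, a zero initial state, a random reservoir rescaled to spectral radius 0.8, and a toy delay-1 recall task chosen only to exercise the readout. The helper names (`scaled_reservoir`, `run_esn`, `extend`, `train_readout`) are illustrative, and `np.linalg.pinv` stands in for the pseudo-inverse.

```python
import numpy as np

def scaled_reservoir(N, spectral_radius, rng):
    """Random N x N matrix rescaled so its largest |eigenvalue| equals spectral_radius."""
    Omega = rng.standard_normal((N, N))
    return Omega * (spectral_radius / np.max(np.abs(np.linalg.eigvals(Omega))))

def run_esn(u, Omega, omega, f=np.tanh):
    """Iterate Eq. (1), x(t+1) = f(Omega x(t) + omega u(t)), starting from x = 0.
    Row t of the result is the state reached after feeding u[t]."""
    x = np.zeros(Omega.shape[0])
    states = np.empty((len(u), len(x)))
    for t, u_t in enumerate(u):
        x = f(Omega @ x + omega * u_t)
        states[t] = x
    return states

def extend(states):
    """Append the optional constant 1 to every state (the extended state of Eq. (2))."""
    return np.hstack([states, np.ones((len(states), 1))])

def train_readout(states, y):
    """Least-squares readout of Eq. (3) via the pseudo-inverse: Psi' = X^+ y."""
    return np.linalg.pinv(extend(states)) @ y

# Usage: recall the input from one step earlier (a toy delay-1 memory task).
rng = np.random.default_rng(0)
N, T, washout, delay = 50, 2000, 100, 1
Omega = scaled_reservoir(N, 0.8, rng)
omega = 0.1 * rng.standard_normal(N)
u = rng.uniform(-0.5, 0.5, T)
X = run_esn(u, Omega, omega)[washout:]
y = u[washout - delay:T - delay]           # desired output u(t - delay) for each row
psi = train_readout(X, y)
y_out = extend(X) @ psi                    # Eq. (2) applied at every time step
r2 = np.corrcoef(y_out, y)[0, 1] ** 2
```

Using the pseudo-inverse instead of forming $(X'X)^{-1}$ explicitly gives the same least-squares solution when $X'X$ is invertible, and it is numerically better behaved when it is not.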


B. Feed-forward and Feedback Product Nodes 

One of the goals of neural network research is to discover how high-order interactions can be represented using simple nodes inspired by neurons. Following the observation of multiplicative interactions in NMDA receptors at the level of a single neuron [26], Durbin and Rumelhart [16] proposed product nodes in which different stimuli are raised to a power given by the respective synaptic weights and multiplied together. Single-neuron multiplicative interactions of this type have also been observed in owl ICx neurons [23] and locust LGMD neurons [20]. Before giving a prescription for product RC, we briefly review the properties of a single product node with feed-forward and feedback connections. The original product node was defined as follows [16]:

$$x = f\left(u_1^{\omega_1} u_2^{\omega_2} \cdots u_N^{\omega_N}\right) = f\left(\prod_{i=1}^{N} u_i^{\omega_i}\right),$$

where $x$ is the output of the product node, $f$ is the activation function, $u_i$ represent $N$ different input stimuli, and $\omega_i$ are the corresponding weights on the inputs. We generalize this model to include a feedback connection by which the node can use multiplicative interaction with its input history to produce the output. This can be represented as follows:

$$x(t) = f\left(x(t-1)^{\Omega}\, u_1(t-1)^{\omega_1} u_2(t-1)^{\omega_2} \cdots u_N(t-1)^{\omega_N}\right) = f\left(x(t-1)^{\Omega} \prod_{i=1}^{N} u_i(t-1)^{\omega_i}\right),$$

where $\Omega$ represents the weight of the feedback connection. Note that the multiplicative coupling imposes some additional constraints on the admissible ranges for $\Omega$, $\omega$, and $u$. Namely, a zero value in $x(t)$ and/or $u(t)$ at any point in time forces a reset of the entire history in the node, erasing its short-term memory. Moreover, the memory of an entire network of product nodes will be erased if a single node becomes zero. To achieve short-term memory with multiplicative feedback nodes, we must choose the exponents such that the old inputs approach the value 1 and their effect diminishes over time. A possible choice is $u(t) \in (0,1]$, $\Omega \in (0,1]$, and $\omega \in (0,1]$. It is noteworthy that $u(t) \in [-1,0)$ is an admissible input, but it will result in complex values that could be interpreted as a mechanism for simultaneously encoding firing rate and phase information in a biological neuron [27]. A complete analysis of the effect of such behavior is beyond the scope of this work. Figure 3 illustrates the output values of a product node $u^{\omega}$ for positive and negative input domains and different input weights $\omega$.



(a) nonlinear reservoir

Fig. 2: Schematic of an ESN. A time-varying input signal $u(t)$ drives a dynamical core called a reservoir. The states of the reservoir $x(t)$ are combined linearly to produce the output $y(t)$. The reservoir consists of $N$ nodes. The input and the reservoir connections are given by the vector $\omega$ and the matrix $\Omega$, respectively. The reservoir states and the constant are connected to the readout layer using the weight matrix $\Psi$.

Another practical consideration is that as the feedback weight $\Omega$ approaches 1, the output of the product unit approaches 0 because of its long multiplicative history of values in the range $(-1,1)$. This is similar to the saturating effect of a tanh function in a standard ESN. We found that for $\Omega > 0.8$ the dynamics of a feedback node are not suitable for storing its input history.
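To make these constraints concrete, here is a small sketch of a single product node with one input and self-feedback, assuming a linear activation $f$ and $x(0) = 1$; the function name and all numeric values are illustrative. It shows an input pulse fading back toward 1, the settled value $c^{\omega/(1-\Omega)}$ under a constant input $c < 1$ moving toward 0 as $\Omega \to 1$, and a negative input producing a complex state.

```python
import numpy as np

def feedback_product_node(u, Omega_fb, w, x0=1.0):
    """Single product node with linear activation and self-feedback:
    x(t) = x(t-1)**Omega_fb * u(t-1)**w, computed in complex arithmetic
    (principal branch) so that negative inputs are admissible."""
    x = complex(x0)
    xs = []
    for u_prev in u:
        x = x ** Omega_fb * complex(u_prev) ** w
        xs.append(x)
    return np.array(xs)

# A pulse of 0.5 followed by neutral inputs of 1: its trace fades back toward 1.
trace = feedback_product_node(np.r_[0.5, np.ones(30)], Omega_fb=0.5, w=0.5)

# Constant input c = 0.9: the node settles at c**(w / (1 - Omega_fb)),
# which moves toward 0 as the feedback weight approaches 1.
settled = {Om: feedback_product_node(np.full(500, 0.9), Om, 0.5)[-1].real
           for Om in (0.5, 0.8, 0.95)}

# A negative input yields a complex-valued state.
neg = feedback_product_node([-0.1], Omega_fb=0.5, w=0.5)[0]
```

The settled value follows from the fixed point of $\log x = \Omega \log x + \omega \log c$.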

C. ESN Architecture with Recurrent Product Network 

We now consider the general case of a network with multiple nodes. For simplicity, we use product nodes with a linear activation function in the reservoir and a linear readout layer trained with ordinary linear regression. For very small input weights, tanh-ESNs behave very similarly to linear ESNs. An appropriate combination of input weight scaling and reservoir bias $b$ can map the inputs onto the nonlinear regions of the tanh function, which dramatically improves the performance of the ESN on nonlinear tasks [28]. We will later show that, despite its linear activation, the product architecture achieves performance similar to that of ESNs with tanh activations on standard benchmark tasks.

(a) $u = 0.1$ (b) $u = -0.1$

Fig. 3: Example of the behavior of a product node with positive and negative input $u$ for different input weights $\omega \in [0,1]$. The complex values may be interpreted as simultaneously encoding firing rate and phase information.

We use $N$ coupled product nodes with linear activation to build a recurrent product network as our reservoir. The coupling is given by an $N \times N$ matrix $\Omega = [\Omega_{ij}]$, where $\Omega_{ij}$ is the weight from node $j$ to node $i$. Each node also receives a connection from the input $u(t)$. The input connectivity is given by the vector $\omega = [\omega_i]$, where $\omega_i$ is the weight from the input to reservoir node $i$. Without loss of generality, we restrict ourselves to networks with one input and one output. The state of the reservoir at each time is given by the vector $x(t) = [x_i]$, where $x_i$ is the state of node $i$. We assume that both inputs and reservoir states are defined over compact sets. The time evolution of each node $i$ is given by:

$$x_i(t) = \left(\prod_{j=1}^{N} x_j(t-1)^{\Omega_{ij}}\right) u(t-1)^{\omega_i}. \tag{4}$$

The following proposition gives us a way to simulate the network dynamics using ordinary matrix products.

Proposition 1. The global dynamics of the recurrent product network given by Equation (4) can be expressed as follows:

$$x(t) = \exp\big(\Omega \log x(t-1) + \omega \log u(t-1)\big), \tag{5}$$

where log and exp are applied element-wise.

Proof: Recall that the dynamics of the reservoir nodes are given by the following system:

$$\begin{aligned}
x_1(t) &= \left(\prod_{j=1}^{N} x_j(t-1)^{\Omega_{1j}}\right) u(t-1)^{\omega_1},\\
x_2(t) &= \left(\prod_{j=1}^{N} x_j(t-1)^{\Omega_{2j}}\right) u(t-1)^{\omega_2},\\
&\;\;\vdots\\
x_N(t) &= \left(\prod_{j=1}^{N} x_j(t-1)^{\Omega_{Nj}}\right) u(t-1)^{\omega_N}.
\end{aligned}$$

Taking the logarithm of both sides of each equation, we have:

$$\begin{aligned}
\log x_1(t) &= \sum_{j=1}^{N} \Omega_{1j} \log x_j(t-1) + \omega_1 \log u(t-1),\\
\log x_2(t) &= \sum_{j=1}^{N} \Omega_{2j} \log x_j(t-1) + \omega_2 \log u(t-1),\\
&\;\;\vdots\\
\log x_N(t) &= \sum_{j=1}^{N} \Omega_{Nj} \log x_j(t-1) + \omega_N \log u(t-1).
\end{aligned}$$

This can be rewritten in compact matrix form as:

$$\log x(t) = \Omega \log x(t-1) + \omega \log u(t-1).$$

Finally, an element-wise exponentiation of both sides gives:

$$x(t) = e^{\Omega \log x(t-1) + \omega \log u(t-1)}.$$
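Proposition 1 is easy to check numerically. The sketch below uses arbitrary illustrative sizes and weights and strictly positive states and input (so every logarithm is real). It compares one step of the element-wise product in Equation (4) with the exp-of-log matrix form of Equation (5).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
Omega = 0.3 * rng.standard_normal((N, N))   # illustrative reservoir weights
omega = rng.uniform(0.0, 0.5, N)            # illustrative input weights
x_prev = rng.uniform(0.2, 1.0, N)           # positive states keep the logarithm real
u_prev = 0.7                                # positive input

def step_direct(x_prev, u_prev):
    """Eq. (4): x_i(t) = (prod_j x_j(t-1)**Omega_ij) * u(t-1)**omega_i."""
    return np.prod(x_prev[None, :] ** Omega, axis=1) * u_prev ** omega

def step_explog(x_prev, u_prev):
    """Eq. (5): element-wise exp of (Omega log x(t-1) + omega log u(t-1))."""
    return np.exp(Omega @ np.log(x_prev) + omega * np.log(u_prev))

direct = step_direct(x_prev, u_prev)
explog = step_explog(x_prev, u_prev)
```

The exp-of-log form replaces $N^2$ powers per step with a single matrix-vector product, which is what makes simulating large product reservoirs cheap.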


Corollary 1. Given a recurrent product network with dynamical equation described by Equation (4), the state of the network at a given time can be explicitly written as a function of the initial state of the network and its input history as follows:

$$x(t) = \exp\left(\Omega^{t} \log x(0) + \sum_{i=0}^{t-1} \Omega^{t-i-1}\, \omega \log u(i)\right). \tag{6}$$

Proof: This can be easily verified by expanding the recursion in Equation (5):

$$\begin{aligned}
x(1) &= \exp\big(\Omega \log x(0) + \omega \log u(0)\big),\\
x(2) &= \exp\big(\Omega \log x(1) + \omega \log u(1)\big) = \exp\big(\Omega^{2} \log x(0) + \Omega\, \omega \log u(0) + \omega \log u(1)\big),\\
&\;\;\vdots\\
x(t) &= \exp\left(\Omega^{t} \log x(0) + \sum_{i=0}^{t-1} \Omega^{t-i-1}\, \omega \log u(i)\right).
\end{aligned}$$
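A similar sanity check for Corollary 1 uses illustrative random weights (spectral radius 0.9), positive inputs, and a positive initial state. Iterating Equation (5) should reproduce the closed form of Equation (6).

```python
import numpy as np

rng = np.random.default_rng(2)
N, t_end = 4, 25
Omega = rng.standard_normal((N, N))
Omega *= 0.9 / np.max(np.abs(np.linalg.eigvals(Omega)))   # spectral radius 0.9
omega = 0.2 * rng.standard_normal(N)
u = rng.uniform(0.1, 1.0, t_end)      # inputs u(0), ..., u(t_end - 1)
x0 = rng.uniform(0.5, 1.5, N)         # arbitrary positive initial state x(0)

# Iterate Eq. (5) t_end times: x(i + 1) depends on x(i) and u(i).
x = x0.copy()
for i in range(t_end):
    x = np.exp(Omega @ np.log(x) + omega * np.log(u[i]))

# Closed form of Eq. (6): Omega^t log x(0) + sum_{i=0}^{t-1} Omega^(t-i-1) omega log u(i).
mp = np.linalg.matrix_power
log_x = mp(Omega, t_end) @ np.log(x0)
log_x = log_x + sum(mp(Omega, t_end - i - 1) @ omega * np.log(u[i]) for i in range(t_end))
x_closed = np.exp(log_x)
```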


Computation in ESNs is enabled by an important property that ensures that the reservoir state is asymptotically only a function of its input history. This is called the echo state property (ESP). In [29], two conditions were stipulated for a recurrent network given by a weight matrix $\Omega$ to have the ESP: (1) a necessary condition that the spectral radius of $\Omega$ should not be greater than unity; and (2) a sufficient condition that the largest singular value of $\Omega$ should be less than unity. Later, the sufficient condition was deemed too conservative and was updated [30], and the necessary condition was shown to be statistically sufficient for a good reservoir [?]. Yildiz et al. [31] presented a pathological example to demonstrate that, for an ESN with tanh functions, a spectral radius below unity does not guarantee the ESP. However, this only holds for nonlinear systems; for a linear system, a weight matrix spectral radius less than unity is enough to guarantee the ESP. The following proposition builds on Corollary 1 and gives us an equivalent of the ESP for recurrent product networks.


Proposition 2. Given a recurrent product network described by Equation (4), the assumed compactness conditions on the inputs and the network state, and a recurrent weight matrix $\Omega$ with spectral radius $\lambda < 1$, the asymptotic global dynamics of the network are only a function of the input history $u(t)$.


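Proof: First, we note that the dynamics of $\log x(t)$ are linear. In addition, the all-ones vector $\mathbf{1}$ corresponds to $\log x = 0$, the fixed point of the autonomous linear system $\log x(t) = \Omega \log x(t-1)$, and since $\lambda < 1$ we have $\lim_{t\to\infty} \Omega^{t} \log x(t_0) = 0$. Using Corollary 1, we can write the state of the system as $t \to \infty$ as follows:

$$\lim_{t\to\infty} x(t) = \lim_{t\to\infty} \exp\left(\Omega^{t} \log x(0) + \sum_{i=0}^{t-1} \Omega^{t-i-1}\, \omega \log u(i)\right) = \lim_{t\to\infty} \exp\left(\sum_{i=0}^{t-1} \Omega^{t-i-1}\, \omega \log u(i)\right),$$

which is a function of only the input history.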


We should point out that the ESP is usually defined in terms of the asymptotic difference between the states of two identical ESNs that are driven by identical inputs but initialized in different states, i.e., $\lim_{t\to\infty} \|x_1(t) - x_2(t)\| = 0$, where $x_1(t)$ and $x_2(t)$ refer to the long-term states of the ESN initialized with different random values. This definition is equivalent to Proposition 2: subtracting Equation (6) for the two initializations gives $\log x_1(t) - \log x_2(t) = \Omega^{t}\big(\log x_1(0) - \log x_2(0)\big)$, which vanishes as $t \to \infty$ whenever $\lambda < 1$. We emphasize that, since the system's dynamics are linear in the logarithm of the reservoir states and the unity vector $\mathbf{1}$ is the global attractor, Proposition 2 constitutes a necessary and sufficient condition for the ESP in product RCs with a linear transfer function.
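The echo state property can be illustrated directly in log space. In this sketch (illustrative sizes and seed, $\lambda = 0.8$, inputs drawn from a compact subset of $(0,1]$), two copies of the same product reservoir start from different positive states. The distance between their log-states shrinks like $\Omega^{t}$, as the argument above predicts. The function name is hypothetical.

```python
import numpy as np

def product_log_states(u, Omega, omega, x0):
    """Log-states of the product reservoir: log x(t) = Omega log x(t-1) + omega log u(t-1)."""
    log_x = np.log(x0)
    history = []
    for u_t in u:
        log_x = Omega @ log_x + omega * np.log(u_t)
        history.append(log_x)
    return np.array(history)

rng = np.random.default_rng(3)
N, T = 20, 300
Omega = rng.standard_normal((N, N))
Omega *= 0.8 / np.max(np.abs(np.linalg.eigvals(Omega)))    # spectral radius lambda = 0.8
omega = 0.2 * rng.standard_normal(N)
u = rng.uniform(0.05, 1.0, T)                               # inputs in a compact subset of (0, 1]

log_x1 = product_log_states(u, Omega, omega, np.ones(N))                # start at the attractor 1
log_x2 = product_log_states(u, Omega, omega, rng.uniform(0.1, 2.0, N))  # arbitrary positive start
gap = np.linalg.norm(log_x1 - log_x2, axis=1)   # distance between log-states at each step
```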


III. Experiments 

In this section, we compare the standard ESN, with linear and tanh activation functions, to the product RC on computational capacity tasks and chaotic time-series prediction benchmarks.


A. Reservoir Construction and Evaluation 


For our experiments, we use fully connected reservoirs with $N$ nodes. The number of reservoir nodes $N$ is adjusted for each task to obtain reasonably good results in a reasonable amount of time. The reservoir weights $\Omega$ and input weights $\omega$ are drawn i.i.d. from a normal distribution with mean zero and standard deviation 1, i.e., $\mathcal{N}(0,1)$. The reservoir matrix is then rescaled to have spectral radius $\lambda$, while the input weights are multiplied by a scalar input coefficient, which we also denote $\omega$ when reporting parameter values. For the tanh and linear reservoirs, the reservoir nodes are initialized with 0s, and for the product reservoirs they are initialized with 1s.
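The construction above could be sketched as follows. `make_reservoir` and its `kind` argument are hypothetical names. A dense eigenvalue computation is a simple way to rescale to spectral radius $\lambda$ for the moderate reservoir sizes used here.

```python
import numpy as np

def make_reservoir(N, spectral_radius, input_coeff, kind, rng):
    """Fully connected reservoir as described in Section III-A.
    kind is "tanh", "linear", or "product"; only the initial state depends on it."""
    Omega = rng.standard_normal((N, N))                                    # i.i.d. N(0, 1) weights
    Omega *= spectral_radius / np.max(np.abs(np.linalg.eigvals(Omega)))    # rescale to lambda
    omega = input_coeff * rng.standard_normal(N)                           # N(0, 1) times the coefficient
    x0 = np.ones(N) if kind == "product" else np.zeros(N)                  # product: 1s, tanh/linear: 0s
    return Omega, omega, x0

rng = np.random.default_rng(4)
Omega, omega, x0 = make_reservoir(20, 0.9, 0.2, "product", rng)
```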


The reservoirs are driven with task-dependent input $u_t$ for 2,000 time steps, and the readout weights $\Psi$ are calculated as described in Section II-A using MATLAB's pinv() function. For evaluation, the reservoir state is reinitialized and the reservoir is driven for another $T = 2{,}000$ time steps to generate the output $y_t$. For brevity, throughout the experiments section we adopt the subscript notation for the time index, e.g., $y_t$ instead of $y(t)$. By convention, the system performance for computational capacity tasks is evaluated using the capacity function $MC_\tau$, which is the coefficient of determination between the output $y_t$ and the desired output $\hat{y}_t$:

$$MC_\tau = \frac{\mathrm{Cov}^2(y_t, \hat{y}_t)}{\mathrm{Var}(y_t)\,\mathrm{Var}(\hat{y}_t)}, \tag{7}$$







(a) product (b) linear (c) tanh

Fig. 4: The linear memory capacity of the product RC and the standard ESN with linear and tanh activation functions for a fixed input coefficient $\omega = 0.2$ and different $\lambda$. As expected, the more nonlinearity in the system, the faster the memory curves decrease, i.e., the tanh-ESN curves decrease faster than those of the linear ESN, and the product RC curves decrease faster than those of the tanh-ESN.


where $\hat{y}_t = \hat{y}_t(u_{t-\tau})$ is a function of the delayed input $u_{t-\tau}$ and $\tau$ is the memory length for the task. Total capacities are calculated by summing the capacity function over $\tau$: $MC = \sum_{\tau} MC_\tau$. We use $1 \le \tau \le 50$ for our empirical estimations. Note that negative inputs to the product RC result in complex-valued outputs, capacities, and errors; therefore, one must use the modulus of $MC_\tau$ and NMSE for the product RC.
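Here is a sketch of the capacity function of Equation (7) and of the total capacity. The text does not spell out how the modulus is applied to complex-valued product-RC outputs. This sketch assumes that the moments are computed without complex conjugation and that the modulus is taken at the end, which is only one possible reading. The function names are illustrative.

```python
import numpy as np

def capacity(y, y_hat):
    """Eq. (7): Cov^2(y, y_hat) / (Var(y) Var(y_hat)).

    Moments are taken without complex conjugation and the modulus is applied at
    the end (an assumed reading of "use the modulus of MC_tau")."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    dy, dyh = y - y.mean(), y_hat - y_hat.mean()
    mc = np.mean(dy * dyh) ** 2 / (np.mean(dy * dy) * np.mean(dyh * dyh))
    return float(np.abs(mc))

def total_capacity(outputs, targets):
    """MC = sum over tau of MC_tau; outputs[k] and targets[k] belong to the k-th delay."""
    return sum(capacity(y, y_hat) for y, y_hat in zip(outputs, targets))

# Usage: an affine function of the target has (numerically) full capacity.
y_hat = np.linspace(0.0, 1.0, 100)
mc_demo = capacity(3.0 * y_hat - 2.0, y_hat)
```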


For the chaotic prediction tasks, the performance is evaluated by calculating the normalized mean-squared error (NMSE) as follows:

$$\mathrm{NMSE} = \frac{\sqrt{\frac{1}{T}\sum_{t=0}^{T} (y_t - \hat{y}_t)^2}}{\mathrm{Var}(\hat{y})}, \tag{8}$$

where $y_t$ is the network output and $\hat{y}_t$ is the desired output.
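Equation (8) translates directly into code. The sketch assumes the population variance (NumPy's default `ddof=0`) and takes the modulus of the error so that complex-valued product-RC outputs are handled. The function name is illustrative.

```python
import numpy as np

def nmse(y, y_hat):
    """Eq. (8): root of the mean squared error divided by the variance of the desired output.
    The modulus of the error makes the function usable with complex product-RC outputs."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    rmse = np.sqrt(np.mean(np.abs(y - y_hat) ** 2))
    return float(rmse / np.var(y_hat))
```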


B. Computational Capacity 

Computational capacity tasks consist of linear memory and nonlinear capacities, which measure how well an ESN can reconstruct a function of its previous inputs. In this set of experiments, reservoirs of $N = 20$ nodes are driven with a one-dimensional input drawn from a uniform distribution on $(0,1]$. We systematically vary the input weight coefficient and the reservoir spectral radius in the ranges $0.001 \le \omega \le 1$ and $0.01 \le \lambda \le 0.95$. The intervals were chosen from preliminary experiments to capture the regions of the parameter space where we obtain the best results or where the trends vary, for the purpose of sensitivity analysis. All results are averaged over 50 runs. The desired outputs for the capacity tasks are defined below.


1) Linear Memory Capacity: The linear memory capacity is a standard measure of memory in recurrent neural networks. The $\tau$-delay memory function $MC_\tau$ measures how well the network can recall the input it received $\tau$ steps earlier. The desired output for this task is defined as:

$$\hat{y}_t = u_{t-\tau}. \tag{9}$$

2) Nonlinear Computation Capacity: The nonlinear computation capacity measures the ability of the system to reconstruct a nonlinear function of its past inputs. Conventionally, Legendre polynomials are used to calculate the nonlinear computation capacity of the reservoir [9]. Their advantage is that Legendre polynomials of different orders are orthogonal to each other, allowing one to measure the reservoir's capacity to compute functions of varying degrees of nonlinearity independently of each other. The desired output for the Legendre polynomial of order $n$ with delay $\tau$ is given by:

$$\hat{y}_t = \frac{1}{2^n} \sum_{k=0}^{n} \binom{n}{k}^2 (u_{t-\tau} - 1)^{n-k} (u_{t-\tau} + 1)^{k}. \tag{10}$$

(a) total nonlinear capacity (b) nonlinear capacity function for $n = 3$

Fig. 5: (a) Nonlinear computation capacity of ESNs with product nodes, tanh nodes, and linear nodes. With the exception of $n = 3$, the product RC outperforms the tanh-ESN in nonlinear capacity. (b) The complete nonlinear capacity function for $n = 3$. The product network exhibits long-term memory, whereas the tanh-ESN exhibits short-term memory.

3) Results: Figure 4 shows the results of the linear memory capacity experiments for different architectures and reservoir spectral radii. The input coefficient is fixed at $\omega = 0.2$. The $x$-axis shows the time delay as a ratio $\tau/N$, and the curves are averaged over 50 experiments. In general, the product RCs show a faster decrease in $MC_\tau$ due to the product nonlinearity. This is similar in nature to the effect that the saturated tanh activation function has on memory capacity. We then calculate the empirical total memory $MC = \sum_{\tau} MC_\tau$ for different values of $\omega$ and $\lambda$.
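The capacity measurements in this subsection can be sketched end to end for a product RC. The sketch assumes the log-space simulation of Equation (5), $x(0) = \mathbf{1}$ for both the training and the evaluation runs, no washout, and a held-out input sequence for evaluation. $N = 20$, $T = 2{,}000$, $\omega = 0.2$, and $1 \le \tau \le 50$ follow the text; the spectral radius 0.9, the random seed, and all helper names are illustrative. Row $t$ of the state matrix is the state after feeding $u_t$, so delay $\tau$ pairs it with $u_{t-\tau}$.

```python
import numpy as np
from math import comb

def legendre_target(u_delayed, n):
    """Desired output of Eq. (10): the order-n Legendre polynomial of the delayed input."""
    return sum(comb(n, k) ** 2 * (u_delayed - 1) ** (n - k) * (u_delayed + 1) ** k
               for k in range(n + 1)) / 2 ** n

def product_states(u, Omega, omega):
    """Product-RC states from the log-space recursion of Eq. (5), starting at x(0) = 1.
    Row t holds the state after feeding u[t]."""
    log_x = np.zeros(Omega.shape[0])
    out = np.empty((len(u), len(log_x)))
    for t, u_t in enumerate(u):
        log_x = Omega @ log_x + omega * np.log(u_t)
        out[t] = log_x
    return np.exp(out)

def extended_states(u, Omega, omega):
    """States with the constant 1 appended, as used by the readout."""
    S = product_states(u, Omega, omega)
    return np.hstack([S, np.ones((len(S), 1))])

def capacity_curve(u_train, u_test, Omega, omega, taus, target=lambda v: v):
    """Held-out MC_tau for each delay; target maps delayed inputs to desired outputs
    (identity for Eq. (9), legendre_target for Eq. (10))."""
    X_tr = extended_states(u_train, Omega, omega)
    X_te = extended_states(u_test, Omega, omega)
    curve = []
    for tau in taus:
        psi = np.linalg.pinv(X_tr[tau:]) @ target(u_train[:len(u_train) - tau])
        y = X_te[tau:] @ psi
        y_hat = target(u_test[:len(u_test) - tau])
        curve.append(np.corrcoef(y, y_hat)[0, 1] ** 2)   # Eq. (7) for real signals
    return np.array(curve)

rng = np.random.default_rng(5)
N, T = 20, 2000
Omega = rng.standard_normal((N, N))
Omega *= 0.9 / np.max(np.abs(np.linalg.eigvals(Omega)))     # spectral radius 0.9
omega = 0.2 * rng.standard_normal(N)                        # input coefficient 0.2
u_train = 1.0 - rng.uniform(0.0, 1.0, T)                    # samples in (0, 1]
u_test = 1.0 - rng.uniform(0.0, 1.0, T)
taus = range(1, 51)
mc_linear = capacity_curve(u_train, u_test, Omega, omega, taus)
MC = mc_linear.sum()                                        # total linear memory capacity
mc_legendre3 = capacity_curve(u_train, u_test, Omega, omega, taus,
                              target=lambda v: legendre_target(v, 3))
```

Evaluating on a fresh input sequence, as the text does by reinitializing and re-driving the reservoir, avoids the upward bias of measuring the correlation on the training data.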


Figure 5a summarizes the nonlinear computation capacity for $2 \le n \le 8$. The product RC clearly shows useful nonlinear computation. However, a complete and fair comparison between the nonlinear capacity of the product RC and that of the tanh ESN is beyond the scope of this work. First, it has already been reported that tanh-ESNs are unable to perform nonlinear memory tasks with even degrees [9], but adding a bias to the reservoir in these networks fixes this problem. In addition, our preliminary experiments show that a multiplicative readout layer in product RCs also significantly improves their performance. To understand the difference between the nonlinear capacity of the product network and the tanh-ESN for $n = 3$, we must look at their capacity functions over $\tau$. Figure 5b shows the 3rd-order capacity function for the three types of networks as a function of the delay $\tau$. The capacity function of each network is shown for the optimal parameter set of that network. The tanh-ESN can perfectly reconstruct the desired output for just a few recent inputs (short-term memory), while the product RC cannot reconstruct the correct values perfectly but can do so for larger $\tau$, i.e., longer input histories. This behavior is analogous to high-quality short-term memory in recurrent networks operating in the linear regime versus low-quality long-term memory in the nonlinear regime [28]. The memory and nonlinear capacity results in this work do not include statistical significance tests and only show qualitative features of product RCs and ESNs. For an accurate estimation of the exact values, one needs to perform the measurement as described in [?]. Also, for simplicity, we have not applied any reservoir bias to the tanh ESNs, which is known to improve the nonlinear capacity of tanh reservoirs.

Figure 6 shows the sensitivity analysis of the memory and nonlinear capacity of both the product RC and the tanh ESN with respect to input weight scaling and reservoir spectral radius. As expected, both product and tanh reservoirs perform best with high spectral radius and low input weight scaling. Next, we see how the product RC and the standard ESN compare in solving signal processing benchmarks.

(a) product, linear memory
(e) product, nonlinear capacity, $n = 5$ (f) tanh, nonlinear capacity, $n = 5$

Fig. 6: Sensitivity analysis of the linear memory capacity and the nonlinear computation capacities of order $n = 3, 5$. The product RC exhibits the best memory capacity for high $\lambda$ and low $\omega$, whereas for the tanh-ESN the optimal parameters depend on the type of memory measured.


C. Chaotic Time Series Prediction 


1) Mackey-Glass System Prediction: The Mackey-Glass system [32] is a delay differential equation with chaotic dynamics, commonly used as a benchmark for chaotic signal prediction. This system is described by:

$$\frac{dx_t}{dt} = \beta \frac{x_{t-\delta}}{1 + x_{t-\delta}^{n}} - \gamma x_t, \tag{11}$$

where $\beta = 0.2$, $n = 10$, and $\gamma = 0.1$ are positive constants and $\delta = 17$ is the feedback delay. The reservoir consists of $N = 500$ nodes, and we systematically vary the input weight coefficient and the spectral radius in the ranges $0.1 \le \omega \le 1$ and $0.1 \le \lambda \le 0.9$. The task is to predict the next $\tau$ integration time steps given $x_t$. We scaled the time series to $[0,1]$ before feeding it to the network.
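A Mackey-Glass series like the one described here can be generated with a simple fixed-step integrator. The sketch below assumes forward Euler with step 0.1, a constant initial history of 1.2, sampling every 1 time unit, and a discarded transient. None of these choices is specified above, and a higher-order integrator would also work. The function name is illustrative.

```python
import numpy as np

def mackey_glass(n_samples, beta=0.2, gamma=0.1, n=10, delta=17, dt=0.1,
                 sample_every=10, x_init=1.2, discard=1000):
    """Forward-Euler integration of Eq. (11) with a constant initial history x_init.

    Returns n_samples values spaced sample_every * dt time units apart, taken
    after discarding `discard` samples of transient, rescaled to [0, 1]."""
    lag = int(round(delta / dt))                      # delay delta in integration steps
    total = (discard + n_samples) * sample_every      # number of Euler steps
    x = np.full(lag + total + 1, x_init)              # entries 0..lag form the initial history
    for k in range(lag, lag + total):
        x_del = x[k - lag]                            # x(t - delta)
        x[k + 1] = x[k] + dt * (beta * x_del / (1 + x_del ** n) - gamma * x[k])
    start = lag + 1 + discard * sample_every
    series = x[start::sample_every][:n_samples]
    return (series - series.min()) / (series.max() - series.min())

series = mackey_glass(2000)
u, y_next = series[:-1], series[1:]   # input x_t and the next sample as target
```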


2) Lorenz System Prediction: The Lorenz system is another standard benchmark task for chaotic prediction. The Lorenz system [33] is defined as:

$$\begin{aligned}
\frac{dx_t}{dt} &= \sigma (y_t - x_t),\\
\frac{dy_t}{dt} &= x_t(\rho - z_t) - y_t,\\
\frac{dz_t}{dt} &= x_t y_t - \beta z_t,
\end{aligned} \tag{12}$$

where $\beta = 8/3$, $\rho = 28$, and $\sigma = 10$. These values give rise to chaotic dynamics, making the system a suitable benchmark for multi-dimensional chaotic time-series prediction. The reservoir consists of $N = 500$ nodes, and we systematically vary the input weight coefficient and the spectral radius in the ranges $0.1 \le \omega \le 1$ and $0.1 \le \lambda \le 0.9$. We feed all three variables to our systems after scaling each variable to the interval $[0,1]$. The task is to produce the next $\tau$ integration time steps for all three variables. We evaluate the performance $\mathrm{NMSE}_{tot}$ by calculating the NMSE for each output and adding them together.
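Similarly, here is a sketch of the Lorenz data and of $\mathrm{NMSE}_{tot}$. It assumes classical fourth-order Runge-Kutta with step 0.01, the initial condition $(1, 1, 1)$, and a discarded transient; all of these are illustrative choices not given in the text. Each variable is rescaled to $[0,1]$ separately, and the total error sums Equation (8) over the three outputs. The function names are hypothetical.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. (12) for the state s = (x, y, z)."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def lorenz_series(n_samples, dt=0.01, s0=(1.0, 1.0, 1.0), discard=1000):
    """Classical fourth-order Runge-Kutta integration; each variable rescaled to [0, 1]."""
    s = np.array(s0, dtype=float)
    out = np.empty((discard + n_samples, 3))
    for k in range(discard + n_samples):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[k] = s
    out = out[discard:]
    return (out - out.min(axis=0)) / (out.max(axis=0) - out.min(axis=0))

def nmse_total(Y, Y_hat):
    """NMSE_tot: Eq. (8) evaluated per output column and summed over the three variables."""
    rmse = np.sqrt(np.mean(np.abs(Y - Y_hat) ** 2, axis=0))
    return float(np.sum(rmse / np.var(Y_hat, axis=0)))

data = lorenz_series(2000)
inputs, targets = data[:-1], data[1:]   # all three variables in, next step out
```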

3) Results: Figure 7 shows the performance of the product RC and the standard ESN in predicting the next time step of the time series. The product RC achieves accuracy comparable to that of the standard ESN with tanh activation (see Figures 7a and 7b). Similarly, for the Lorenz system, both product and tanh-ESNs show similar performance at their optimal parameters (see Figures 7d and 7e). We have included the linear ESN for comparison. In our experiments, the parameters $\omega = 0.1$ and $\lambda = 0.8$ are optimal for both the product RC and the tanh-ESN on both tasks. The full sensitivity analysis reveals that the product RC and the tanh-ESN show task-dependent sensitivity in different parameter ranges. For example, for the product RC on the Mackey-Glass task, decreasing $\lambda$ increases the error by 4 orders of magnitude, whereas the tanh-ESN's error increases by 5 orders of magnitude. On the other hand, the tanh-ESN is robust to increasing $\lambda$, while the product RC loses its power due to instability in the network dynamics. For the Lorenz system, high $\omega$ and $\lambda$ destabilize the product RC dynamics, which completely destroys the predictive power of the system. However, the performance of the tanh-ESN does not vary significantly with changes in the parameters, because one-step prediction of the Lorenz system, given all three variables, does not require memory of past inputs. The linear ESN does not show any sensitivity to the parameter space.

(d) product, Lorenz (e) tanh, Lorenz (f) linear, Lorenz

Fig. 7: The performance of one-step prediction of the Mackey-Glass and the Lorenz systems. The best performance of product RCs is almost identical to that of the standard ESN with tanh activation function. The standard linear ESN is included for comparison.

We then use the optimal parameter values to test and compare the performance of multi-step prediction for both the Mackey-Glass and the Lorenz systems. The task is for the system to produce the correct values $\tau$ steps ahead. Figure 8 shows $\log_{10}(\mathrm{NMSE})$ for different $\tau$. The product RCs show performance similar to that of the standard tanh-ESNs. The standard ESN with linear activation is included for comparison.

(a) Mackey-Glass (b) Lorenz

Fig. 8: The performance of multi-step prediction of the Mackey-Glass and the Lorenz systems. Product RCs perform almost identically to the standard ESN with tanh activation function. The standard linear ESN is included for comparison.

IV. Conclusion and Outlook

Nonlinearity of neural response is essential for real-time computational tasks that involve strong time variations of the input data. In this manuscript, we considered neural networks with a basic nonlinear property: a neuron outputs a weighted product of its synaptic inputs. The presented model is an abstract representation of some types of neurons whose response function is not fully captured by the commonly used synaptic sums followed by a tanh thresholding function. We evaluated the performance of a neural network with such product units in the computational paradigm of reservoir computing. Product RCs were found to be comparably powerful for nonlinear computation. For the nonlinear capacity task, we used the standard versions of product RC and tanh ESNs for simplicity. In our preliminary experiments, we observed that the use of a bias in tanh ESNs and of a multiplicative readout layer in product RCs can significantly improve their performance. We defer a fuller analysis of these architectural variations to future work. On standard tests, we found that for Mackey-Glass and Lorenz system prediction, a network of product units performs as well as a network of tanh units. For both types of nonlinear networks, we found that the best performance is achieved with relatively small input weights, which does not take advantage of the full nonlinearity of the reservoir nodes. For tanh networks, this nonlinear advantage completely disappears for very small input weights. We will present a detailed study of this subtle behavior in a forthcoming paper [34]. Overall, our findings suggest that neural networks with product neurons may have stronger capacity than tanh neurons for certain real-time data processing tasks.